Towards Image Recognition using Social Media
ABSTRACT
We consider the problem of image recognition on social media websites. Users of these websites perform image recognition tasks all the time without thinking of them as such, for example when they tag images or simply browse them. We would like to put this human effort to use. We therefore propose a preliminary image recognition system trained on images scraped from the Internet, together with an iOS application that lets users label images easily and uses the newly labeled images to improve the recognition system. We will also run an experiment comparing two popular image features, SIFT and HOG. Finally, we will evaluate the performance of both the preliminary system and the improved system.

Categories and Subject Descriptors
I.4.9 [Image Processing and Computer Vision]: Applications

General Terms
Management, Design, Experimentation, Human Factors

Keywords
Image recognition, crowdsourcing

1. INTRODUCTION
Every day, thousands of images are tagged and posted on social media websites. To our knowledge, nobody in the computer vision community has yet tackled the analysis of this data, even though there have been numerous attempts to crowd-source image labeling for general object recognition. We would like to build image classifiers for the kinds of images that are popular on social media sites such as Tumblr and Reddit. For the scope of this class project, we will first focus on only 2 of the popular animals on the Internet: the cat and the dog.

2. DATASET
Our datasets consist mainly of labeled images. Currently we consider only 2 labels: kitten and dog. Each image class will be stored in a separate directory. We will use 3 datasets in this project.

2.1 Tumblr Image Dataset
Tumblr is a social media website whose users post large amounts of multimedia content. The users usually also tag their posts, and these tags can potentially be used as labels for learning. We have scraped the images carrying the targeted tags (kitten and dog).
Currently, we have 4777 images tagged with #dog and 27015 images tagged with #kitten. We will also need a dataset of negative examples for training; we are investigating #earthporn, #nicolas-cage, #rage-comics and others. Our initial classifier will be trained on the Tumblr datasets. Although a large amount of data is available, the tags can be noisy. For example, certain images tagged #doggie do not contain dogs, but something wildly inappropriate. This sort of noise is to be expected of real-world data. Despite the noise, the volume of data is enticing, and is orders of magnitude larger than the datasets typically studied by researchers in computer vision.

2.2 Publication Dataset
To test the performance of our system, we need an accurately labeled image dataset. We will therefore use images from published sources. For this project we have found published databases of dog and cat images, and we have also obtained a database of negative examples. The datasets are:
· Stanford dog dataset (20580 images)
· Pascal 2007 cat dataset (9997 images)
· Stanford 40 images of actions (9532 images)
The Stanford dog dataset was originally intended for learning to classify dog breeds [1], so its published results are not a suitable point of comparison for our task. The authors of the cat dataset do not compute statistics suitable for proving a performance improvement [2]. In particular, the authors look at ROC curves and precision versus recall curves for two competitions. While these curves allow for a point of comparison, we cannot demonstrate statistically whether our approach outperforms theirs.

2.3 Crowd sourcing Dataset
Finally, we want to obtain labeled images from users acting as annotators. Users will have an easy way to label images while they enjoy looking at their favourite kind of image. We will retrieve images from Reddit, another popular social media website, and let the users label them.
This dataset will be used to train our classifier in order to improve the overall performance of the system. We do not anticipate getting a lot of users, so we hope that our friends can be of assistance. Generally, people have responded positively to the task “look at pictures of cats”.

1 http://vision.stanford.edu/aditya86/ImageNetDogs/
2 http://137.189.35.203/WebUI/CatDatabase/catData.html
3 http://vision.stanford.edu/Datasets/40actions.html

3. PROBLEM DEFINITION
3.1 Task
We will predict whether a given image shows a cat or a dog. Our system will have 2 image classifiers. The first is an offline classifier that we will train on images scraped from Tumblr (Section 2.1). The second is a classifier in a mobile application that uses the offline classifier as its starting point. The system will fetch an image from the Internet (mainly from Reddit), classify it, and let a user verify the result so that the system improves incrementally.
For this classification task, we will use an SVM to output the image label, given a bag of words built from the features of an image. First, features are extracted from the raw image. We are considering 2 feature extraction algorithms: scale-invariant feature transform (SIFT) and histogram of oriented gradients (HOG). The feature vectors are then quantized against clusters learned by k-means clustering. Online, the nearest cluster for each feature is retrieved using a KD-Tree (constructed during the training phase) to obtain the bag of words, which is fed into the SVM to predict the label.
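The quantization step of this pipeline can be illustrated with a minimal, dependency-free sketch. It is not the project's code: the codebook and descriptors are made-up 2-D toy values (real SIFT or HOG descriptors have many more dimensions), the function names are hypothetical, and the nearest cluster is found by brute force rather than with a KD-Tree. In the proposed system the codebook would come from k-means and the resulting histogram would be passed to the SVM.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_word(descriptor, codebook):
    """Index of the cluster centre closest to the descriptor (brute force)."""
    return min(range(len(codebook)),
               key=lambda i: squared_distance(descriptor, codebook[i]))


def bag_of_words(descriptors, codebook):
    """Normalised histogram of cluster assignments for one image."""
    counts = [0] * len(codebook)
    for d in descriptors:
        counts[nearest_word(d, codebook)] += 1
    total = sum(counts)
    if total == 0:
        return [0.0] * len(codebook)
    return [c / float(total) for c in counts]


# Toy data (assumed for illustration): 3 cluster centres, 4 descriptors.
codebook = [(0.0, 0.0), (10.0, 0.0), (0.0, 10.0)]
descriptors = [(0.5, 0.2), (9.0, 1.0), (9.5, -0.5), (1.0, 8.0)]
histogram = bag_of_words(descriptors, codebook)
print(histogram)  # [0.25, 0.5, 0.25]
```

Each descriptor costs one pass over all cluster centres here, so the brute-force lookup grows with the product of the number of descriptors and the number of clusters.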
3.2 Evaluation
To measure performance, we intend to look at both the F1 score and sensitivity. The F1 score is defined to be:

$$F_1 = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$$

where precision is the fraction of images labeled positive that are truly positive, and recall is the same quantity as sensitivity. Sensitivity is defined as the fraction of true positives that are correctly identified. We choose sensitivity as a performance measure because, in offline training, we want a classifier that identifies cats at the expense of some false alarms. The user will then have a job to do in training the system to make fewer of these false alarms (type I errors).
For ground truth evaluation, we will use the Stanford dog dataset, the cat dataset, and the Stanford 40 action images, as discussed in Section 2.2. Our points of comparison will be:
· HOG features versus SIFT features
· Number of clusters used for quantization
· Performance of the offline classifier versus the online trained classifier
We intend to have at least 40 data points for each experiment, and we propose to use non-parametric statistical tests (the sign test and the Kolmogorov-Smirnov test) to check for differences in performance.

4. IMPLEMENTATION PLAN
4.1 Our software
We will write the following software:
· Scripts to scrape Tumblr for images
· Scripts to extract features and learn classifiers using QUEST
· An iOS application

4.2 Existing Software
4.2.1 Overview
All of our software is available via GitHub: https://github.com/xcthulhu/EECS349
For this work, we are using the QUEST cluster on the Northwestern campus for offline training and Xcode v4.5 for iOS development. The iOS software is tested on an iPad Rev 3 and an iPhone 4.

4.2.2 Offline Algorithms
For our offline training, we make extensive use of Python v2.7.3 and, in particular, of the scikit-learn Python module [3]. We use scikit-learn to perform:
· Feature vector quantization (via iterated k-means clustering)
· Feature selection
· KD-Tree construction
· SVM classification
We discuss how this software is used in subsequent sections.
We are investigating the use of LaSVM as an alternative to scikit-learn for SVM training, and of FLANN as an alternative to scikit-learn's KD-Tree implementation [4]. Likewise, we are investigating the use of the Shogun machine learning toolbox, as it is reported to outperform scikit-learn on k-means clustering benchmarks.
In addition to scikit-learn, we use the scikit-image Python module and a customized version of the python-tumblr module (available in our GitHub repository). All Python software is sandboxed using virtualenv. Apart from Python, we will use libsiftfast to extract SIFT image features [5].

4.2.3 Online Algorithms
Online feature extraction will be performed on iOS devices using the sift-gpu-iphone framework. Online nearest-neighbor search with KD-Trees will be implemented using OpenCV, via its official iOS framework. Finally, online SVM training will be carried out using LaSVM.

4.3 Image Classification Learning
4.3.1 Overview
To classify images, we use the bag of words approach. This involves:
· Raw image feature extraction
· Feature vector quantization, producing histograms of features
· Feature selection, reducing the dimensionality of the feature histograms
· Classifier training
To speed up online image classification, a KD-Tree is also constructed.

4 http://scikit-learn.org/stable/
5 http://leon.bottou.org/projects/lasvm
6 http://www.cs.ubc.ca/~mariusm/index.php/FLANN/FLANN
7 http://www.shogun-toolbox.org
8 http://scikit-image.org
9 http://pypi.python.org/pypi/python-tumblpy/0.3.1
10 http://pypi.python.org/pypi/virtualenv
11 http://sourceforge.net/projects/libsift/
12 https://github.com/Moodstocks/sift-gpu-iphone
13 http://leon.bottou.org/projects/lasvm

4.3.2 Raw Image Feature Extraction
In this project we study the use of two different feature extraction systems: HOG features and SIFT features. HOG features are implemented in scikit-image. SIFT features are implemented in libsiftfast, which is suitable for offline use.
For online use, we intend to use a fast, GPU-based implementation of SIFT that is available for iOS.

4.3.3 Feature Quantization
Feature quantization is popularly employed in image analysis; for instance, see [6] for a discussion of the relative merits of feature quantization versus other approaches in image recognition. To compute the quantization, we use k-means clustering [7]. To speed up computation, we use MiniBatch K-means [8], a probabilistic variation of k-means that has a smaller memory footprint and converges faster. To further distribute feature quantization, we employ a variation on the method from [9]. Briefly, this algorithm involves:
· Cluster the data into a small number of clusters (from 10 to 100)
· Rebin the data into the computed clusters
· Call worker nodes to compute new clusters within each bin
This method, while simple, effectively exploits the computational resources available on the QUEST cluster. K-means and MiniBatch K-means are implemented in scikit-learn, although we note that the implementation in the Shogun toolbox is reported to be faster by a factor of 2.

4.3.4 Improving Online Histogram Calculation
The ultimate purpose of feature quantization is to compute histograms, which are known as bags of words. However, given an image with N features and M quantized clusters, computing this bag of words with a naïve algorithm takes O(N×M) time. For large values of N and M, this calculation can become prohibitive, preventing reasonable online performance. One solution to this problem is to use a KD-Tree [10]. Querying a KD-Tree of M points for an approximate nearest neighbor can be done in O(log(M)) time, which brings the cost for a whole image down to O(N×log(M)). An implementation of this is available in the Fast Library for Approximate Nearest Neighbors (FLANN) [4]. Other implementations include scikit-learn and OpenCV.

4.3.5 Feature Selection
Another simple and important way to improve bag of words classification is to eschew high frequency and low frequency features.
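As a rough illustration of this idea, the sketch below drops visual words that occur in too many or too few images. It rests on assumptions of our own: "frequency" is read as document frequency (the fraction of images in which a word appears at least once), and the thresholds `min_df` and `max_df`, the function names, and the toy histograms are all hypothetical. It is not one of the scikit-learn selectors.

```python
def select_words(histograms, min_df=0.1, max_df=0.9):
    """Indices of visual words whose document frequency lies in [min_df, max_df]."""
    n_images = len(histograms)
    n_words = len(histograms[0])
    kept = []
    for w in range(n_words):
        # Fraction of images in which word w occurs at least once.
        df = sum(1 for h in histograms if h[w] > 0) / float(n_images)
        if min_df <= df <= max_df:
            kept.append(w)
    return kept


def project(histogram, kept):
    """Reduce one histogram to the selected visual words."""
    return [histogram[w] for w in kept]


# Toy bag-of-words counts (assumed): 4 images, 4 visual words.
histograms = [
    [3, 0, 1, 0],
    [2, 0, 0, 1],
    [5, 0, 2, 0],
    [1, 0, 0, 0],
]
kept = select_words(histograms, min_df=0.25, max_df=0.75)
reduced = [project(h, kept) for h in histograms]
print(kept)     # [2, 3]
print(reduced)  # [[1, 0], [0, 1], [2, 0], [0, 0]]
```

Word 0 appears in every image and word 1 in none, so neither helps to tell the classes apart; only words 2 and 3 survive the filter.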
There are a number of approaches to this:
· Univariate feature selection
· Recursive feature selection
· L1-based feature selection
These selection techniques are implemented in scikit-learn.

4.3.6 Support Vector Machine Training
Our principal learning model is the Support Vector Machine. As mentioned previously, scikit-learn provides one implementation (providing front-ends to both liblinear and libsvm). Alternatively, one can use LaSVM. We intend to use LaSVM for the online platform.

4.4 Milestones
Table 1. Milestone plan
Milestone | Member(s) | Date
Download images | Thanapon | November 12
Comparison of SIFT and HOG features using offline training | Thanapon, Matthew | November 16
Implementation of KD-Trees for performance enhancement | Matthew | November 16
iOS alpha (something that classifies a cat, a dog, and a negative example using SVM) | Thanapon, Matthew | November 23
iOS beta (downloads an image from Reddit, uses KD-Tree) | Thanapon, Matthew | November 27
Have crowd-sourced labeled data (from friends if possible) | Thanapon, Matthew | December 2
Comparison of offline and online performance | Thanapon, Matthew | December 7

5. RELATED WORK
· Distributed clustering algorithms suitable for grid computing are discussed in [11].
· HOG features are presented in [12]. SIFT features are presented in [5].
· Other crowd-sourced image labelling efforts of note include [13], [14], and [15].
· One recent paper of note in which a large number of images were scraped from the Internet for machine learning is [16].

6. REFERENCES
[1] A. Khosla, N. Jayadevaprakash, B. Yao, and L. Fei-Fei, “Novel Dataset for Fine-Grained Image Categorization,” in First Workshop on Fine-Grained Visual Categorization, IEEE Conference on Computer Vision and Pattern Recognition, Colorado Springs, CO, 2011.
[2] W. Zhang, J. Sun, and X. Tang, “Cat Head Detection - How to Effectively Exploit Shape and Texture Features,” in ECCV (4), 2008, pp. 802–816.
[3] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and É. Duchesnay, “Scikit-learn: Machine Learning in Python,” J. Mach. Learn. Res., vol. 12, pp. 2825–2830, Nov. 2011.
[4] M. Muja and D. G. Lowe, “Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration,” in International Conference on Computer Vision Theory and Application (VISSAPP’09), 2009, pp. 331–340.
[5] D. G. Lowe, “Object recognition from local scale-invariant features,” in The Proceedings of the Seventh IEEE International Conference on Computer Vision, 1999, vol. 2, pp. 1150–1157.
[6] M. Faundez-Zanuy, “On-line signature recognition based on VQ-DTW,” Pattern Recognition, vol. 40, no. 3, pp. 981–992, Mar. 2007.
[7] J. MacQueen, “Some methods for classification and analysis of multivariate observations,” in Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, 1967, vol. 1, p. 14.
[8] D. Sculley, “Web-scale k-means clustering,” in Proceedings of the 19th International Conference on World Wide Web, 2010, pp. 1177–1178.
[9] S. Kantabutra and A. L. Couch, “Parallel K-means clustering algorithm on NOWs,” NECTEC Technical Journal, vol. 1, no. 6, pp. 243–247, 2000.
[10] J. L. Bentley, “Multidimensional binary search trees used for associative searching,” Commun. ACM, vol. 18, no. 9, pp. 509–517, Sep. 1975.
[11] B. Boutsinas and T. Gnardellis, “On distributing the clustering process,” Pattern Recognition Letters, vol. 23, no. 8, pp. 999–1008, Jun. 2002.
[12] N. Dalal and B. Triggs, “Histograms of oriented gradients for human detection,” in IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2005), 2005, vol. 1, pp. 886–893.
[13] L. von Ahn and L. Dabbish, “Labeling images with a computer game,” in Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, New York, NY, USA, 2004, pp. 319–326.
[14] B. Russell, A. Torralba, K. Murphy, and W. Freeman, “LabelMe: A Database and Web-Based Tool for Image Annotation,” International Journal of Computer Vision, vol. 77, no. 1, pp. 157–173, 2008.
[15] S. Mavandadi, S. Dimitrov, S. Feng, F. Yu, U. Sikora, O. Yaglidere, S. Padmanabhan, K. Nielsen, and A. Ozcan, “Distributed Medical Image Analysis and Diagnosis through Crowd-Sourced Games: A Malaria Case Study,” PLoS ONE, vol. 7, no. 5, p. e37245, May 2012.
[16] Q. V. Le, M. Ranzato, R. Monga, M. Devin, K. Chen, G. S. Corrado, J. Dean, and A. Y. Ng, “Building high-level features using large scale unsupervised learning,” arXiv:1112.6209, Dec. 2011.